Unsupervised vocal-tract length estimation through model-based acoustic-to-articulatory inversion
Abstract
Knowledge of vocal-tract (VT) length is a logical prerequisite for acoustic-to-articulatory inversion. Prior work has treated VT length estimation (VTLE) and inversion largely as separate problems. We describe a new algorithm for VTLE based on acoustic-to-articulatory inversion. Our inversion process uses the Maeda model (MM, [1,2]) and combines global search [3] with dynamic programming to transform speech waveforms into articulatory trajectories. The VTLE algorithm searches for the VT length of MM that yields the most accurate and smoothest inversion result. The algorithm was tested on samples of non-nasalized diphthongs (e.g., [ai]) that were synthesized with MM itself, synthesized with TubeTalker (a different VT model, [4]), or collected from child and adult speakers; its performance was compared with that of a conventional formant-frequency-based method. Results on synthesized speech indicate that the inversion-based algorithm achieved greater VTLE accuracy and greater robustness against phonetic variation than the formant-based method. Furthermore, compared with the formant-based method, results from the inversion-based algorithm showed a stronger correlation with an MRI-derived VT length (VTL) measure in adults and greater consistency with previously reported age-VTL relations in children [5].
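The abstract does not say which formant-based method served as the baseline. As a rough illustration of the general idea only, the sketch below assumes the textbook uniform-tube approximation (a tube closed at the glottis and open at the lips), in which the n-th resonance is F_n = (2n - 1)c / (4L). The function names and the speed-of-sound value are assumptions made for this example and do not come from the paper.

```python
# Minimal sketch of a formant-based VT length estimate, assuming a uniform
# tube closed at one end. This is NOT the paper's method or its baseline;
# names and constants here are illustrative assumptions.

SPEED_OF_SOUND_CM_S = 35000.0  # assumed speed of sound in warm, humid air (cm/s)


def uniform_tube_formants(length_cm, n_formants=4, c=SPEED_OF_SOUND_CM_S):
    """Resonances (Hz) of a uniform tube: F_n = (2n - 1) * c / (4 * L)."""
    if length_cm <= 0:
        raise ValueError("length_cm must be positive")
    return [(2 * n - 1) * c / (4.0 * length_cm) for n in range(1, n_formants + 1)]


def estimate_vtl_from_formants(formants_hz, c=SPEED_OF_SOUND_CM_S):
    """Average the per-formant estimates L_n = (2n - 1) * c / (4 * F_n).

    formants_hz must be ordered F1, F2, ... because the formant index n
    enters the formula.
    """
    if not formants_hz or any(f <= 0 for f in formants_hz):
        raise ValueError("formants_hz must be a non-empty list of positive values")
    estimates = [
        (2 * n - 1) * c / (4.0 * f) for n, f in enumerate(formants_hz, start=1)
    ]
    return sum(estimates) / len(estimates)


if __name__ == "__main__":
    formants = uniform_tube_formants(17.5)
    print(formants)
    print(estimate_vtl_from_formants(formants))
```

Because real vowels and diphthongs depart from a uniform tube, an estimate of this kind shifts with the articulation being produced, which is the sensitivity to phonetic variation that the abstract contrasts with the inversion-based approach.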
Similar resources
Evaluation of speech inversion using an articulatory classifier
This paper presents an evaluation method for statistically based speech inversion, in which the estimated vocal tract shapes are classified into phoneme categories based on the articulatory correspondence with prototype vocal tract shapes. The prototypes are created using the original articulatory data, and the classifier hence makes it possible to interpret the results of the inversion in terms of, e.g.,...
An acoustic analysis of lion roars. II: Vocal tract characteristics
This paper makes the first attempt to perform an acoustic-to-articulatory inversion of a lion (Panthera leo) roar. The main problem that one encounters in attempting this is that little is known about the dimensions of the vocal tract, other than a general range of vocal tract lengths. Precious little is also known about the articulation strategies that are adopted by the lion while ...
Recovering vocal tract shapes from MFCC parameters
Recovering vocal tract shapes from the speech signal is a well-known inversion problem of transformation from the articulatory system to speech acoustics. Most of the studies on this problem in the past have been focused on vowels. There have not been general methods effective for recovering the vocal tract shapes from the speech signal for all classes of speech sounds. In this paper we describe...
A Rough Guide to the Acoustic-to-articulatory Inversion of Speech
This article reviews a specific speech research area called acoustic-to-articulatory inversion of speech, or speech inversion, which refers to the problem of mapping the acoustic speech signal onto a space describing the configuration of the human vocal tract that actually produced this signal. This space may be modeled in a variety of ways, such as with trajectories of the movement of the artic...
Methods for Integrating Phonetic and Phonological Knowledge in Speech Inversion
Exploiting the information about the vocal tract shape that produced the speech has been appealing to speech researchers and scientists for a long period of time. Experimental studies that included the articulatory information from physiological measurements supported the idea that this information could be useful in a number of areas of speech science and technology. However, the estimation of...